-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve index names in indices_boost #21393
Conversation
ping @javanna |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, looks good though, sorry it took me ages to get to this ;)
} | ||
|
||
List<Tuple<String, Float>> concreteIndexBoost = new ArrayList<>(); | ||
for (Tuple<String, Float> ib : indexBoost) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you save ib.v1() and ib.v2() to a variable so it is more readable what they hold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
I didn't like tuple since it's not very readable as you pointed out. So, I decided to create a new class.
List<Tuple<String, Float>> concreteIndexBoost = new ArrayList<>(); | ||
for (Tuple<String, Float> ib : indexBoost) { | ||
String[] concreteIndices = | ||
indexNameExpressionResolver.concreteIndexNames(clusterState, IndicesOptions.lenientExpandOpen(), ib.v1()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why hardcoding lenient indices options? maybe use strict? or even better, why not applying the options from the search request so we are consistent with how indices are resolved in the url?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
But I'm wondering if it makes this change breaking. Specifying index that doesn't exist used to be no op. Now returns error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I think it's ok if it is configurable. and then it should do by default the same as the rest of the same search request does.
return this; | ||
} | ||
|
||
/** | ||
* Gets the boost a specific indices will receive when the query is | ||
* executed against them. | ||
*/ | ||
public ObjectFloatHashMap<String> indexBoost() { | ||
public List<Tuple<String, Float>> indexBoost() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very small change, it's a getter that is probably not used (I assume most people just set the index boost), but it is a breaking change for the java api. Not a huge deal though, shall we add the breaking-java label to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
|
||
searchRequest.source().indexBoost(concreteIndexBoost); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it odd that we modify the original source builder with the resolved indices names and replace what we originally had. Would it be possible to transport the resolved indices differently? I think we should serialize this new info separately and carefully handle bw compatibility around that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* @param indexBoost | ||
* The boosts to apply to the indices | ||
*/ | ||
public SearchSourceBuilder indexBoost(List<Tuple<String, Float>> indexBoost) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above, ideally this setter wouldn't be needed, as it's here only because we need it internally to replace the provided indices with their resolved names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed.
logger.info("Query with alias_test1 boosted"); | ||
SearchResponse response = client().search(searchRequest() | ||
.searchType(SearchType.QUERY_THEN_FETCH) | ||
.source(searchSource().explain(true).indexBoost("alias1", indexBoost).query(termQuery("test", "value"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was removed in favor of yaml tests.
|
||
logger.info("--- DFS_QUERY_THEN_FETCH"); | ||
public void testIndicesBoostWithAlias() throws Exception { | ||
createIndex("alias_test1", "alias_test2"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call the concrete indices "index1" and "index2", otherwise one may think they are aliases :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was removed in favor of yaml tests.
|
||
float indexBoost = 1.1f; | ||
|
||
logger.info("--- QUERY_THEN_FETCH"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was removed in favor of yaml tests.
.searchType(SearchType.QUERY_THEN_FETCH) | ||
.source(searchSource() | ||
.explain(true) | ||
.indexBoost("non_existent", boost) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why you used lenient indices options, so that nothing breaks if non existing indices are specified. But maybe that should be configurable like I commented above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
assertThat(response.getHits().totalHits(), equalTo(1L)); | ||
logger.info("Hit[0] {} Explanation {}", response.getHits().getAt(0).index(), response.getHits().getAt(0).explanation()); | ||
assertThat(response.getHits().getAt(0).index(), equalTo("test1")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that you added yaml tests, I wonder if these integ tests are still needed. Maybe we can convert them all to yaml tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
0f62fa9
to
e036efa
Compare
} | ||
builder.endObject(); | ||
builder.endArray(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonders if I should put add concreteIndexBoosts here. Since it's used only internally, it's not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I think we should print out what the user sent, not our own resolved objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks for the clarification.
Thanks @javanna ! Addressed your feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @masaruh for addressing the comments, I left some more, but it's really close.
@@ -107,6 +107,10 @@ protected void doExecute(Task task, SearchRequest searchRequest, ActionListener< | |||
searchRequest.preference()); | |||
failIfOverShardCountLimit(clusterService, shardIterators.size()); | |||
|
|||
if (searchRequest.source() != null) { | |||
searchRequest.source().resolveIndexBoosts(indexNameExpressionResolver, clusterState, searchRequest.indicesOptions()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move the resolution method out of SearchSourceBuilder
too and also transport the info out of it as well? Like we do with aliasFilters map for instance? Does it make sense? Trying to keep SearchSourceBuilder
as POJO as possible and have logic outside of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done.
* @param indexBoost | ||
* The boost to apply to the index | ||
*/ | ||
public SearchSourceBuilder indexBoost(String index, float indexBoost) { | ||
if (this.indexBoost == null) { | ||
this.indexBoost = new ObjectFloatHashMap<>(); | ||
if (index == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Objects.requireNonNull?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
boost = in.readFloat(); | ||
} | ||
|
||
public IndexBoost(QueryParseContext context) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it need to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for the other constructors too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
|
||
public String index() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the get prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return index; | ||
} | ||
|
||
public float boost() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
4c9a13c
to
7e96f67
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this @masaruh thanks for addressing my comments, I left some new ones but mainly minors. I promise it's the last changes I'm asking you to make ;)
out.writeVInt(indexBoostSize); | ||
if (indexBoostSize > 0) { | ||
writeIndexBoost(out); | ||
if (indexBoosts == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we initialize indexBoosts to new ArrayList<>() straight-away instead and remove this null invariant? This if would go away too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
indexBoost.put(in.readString(), in.readFloat()); | ||
} | ||
List<IndexBoost> boosts = in.readList(IndexBoost::new); | ||
if (boosts.size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we can unconditionally assign indexBoosts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True.
I was trying to preserve state where indexBoosts isn't set (null). But if we initialize indexBoosts with empty array, no need to worry.
Objects.requireNonNull(index); | ||
|
||
if (this.indexBoosts == null) { | ||
this.indexBoosts = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the list is already initialized like I suggested above, this if goes away too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
} else if (context.getParseFieldMatcher().match(currentFieldName, INDICES_BOOST_FIELD)) { | ||
indexBoosts = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than re-initializing the list, shall we enforce the fact that users cannot specify boost in both ways? Either deprecated or new syntax, not both at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -167,6 +175,9 @@ protected void innerReadFrom(StreamInput in) throws IOException { | |||
source = in.readOptionalWriteable(SearchSourceBuilder::new); | |||
types = in.readStringArray(); | |||
aliasFilter = new AliasFilter(in); | |||
if (in.getVersion().onOrAfter(Version.V_5_2_0_UNRELEASED)) { | |||
this.indexBoost = in.readFloat(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should be more explicit and initialize indexBoost in an else branch here, rather than above when declaring it. I think de-serialization is the only scenario when it may not get assigned.
@@ -369,3 +369,18 @@ buildRestTests.setups['range_index'] = ''' | |||
body: | | |||
{"index":{"_id": 1}} | |||
{"expected_attendees": {"gte": 10, "lte": 20}, "time_frame": {"gte": "2015-10-31 12:00:00", "lte": "2015-11-01"}}''' | |||
|
|||
// Used by index boost doc | |||
buildRestTests.setups['index_boost'] = ''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool stuff I had no idea you could do this ;)
index: [test_1, test_2] | ||
|
||
--- | ||
"Indices boost using object": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to add a skip section to every test relying on warnings, or the runners not supporting the "warnings" feature will break when trying to run this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Map<String, Float> concreteIndexBoosts = new HashMap<>(); | ||
for (SearchSourceBuilder.IndexBoost ib : source.indexBoosts()) { | ||
String[] concreteIndices = | ||
indexNameExpressionResolver.concreteIndexNames(clusterState, searchRequest.indicesOptions(), ib.getIndex()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we switch to using the method that returns Index[]
and use the index uuid rather than the index name as key in the map? That will help with the upcoming cross cluster search feature ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@clintongormley maybe you want to have a look at the updated docs? |
Thank you again @javanna. Hopefully changes are sufficient :) |
Docs LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heya @masaruh I had one last look. Left a few minors, LGTM otherwise. No need for another review round. Thanks for your patience ;)
this.indexBoost = new ObjectFloatHashMap<>(); | ||
} | ||
this.indexBoost.put(index, indexBoost); | ||
Objects.requireNonNull(index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a message as second argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (keys[i] != null) { | ||
builder.field((String) keys[i], values[i]); | ||
} | ||
if (indexBoosts != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this ever be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it can rather be empty, hence we always print out the array, which sounds ok to me. I think the null check can go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Removed.
@@ -85,17 +87,19 @@ public ShardSearchLocalRequest(ShardId shardId, String[] types, long nowInMillis | |||
this.nowInMillis = nowInMillis; | |||
this.aliasFilter = aliasFilter; | |||
this.shardId = shardId; | |||
indexBoost = 1.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't matter as explain and validate query apis don't support the indices_boost section right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, if we don't set it here, boost is 0. Then, validate query with explain=true
returns parsed queries with boost 0. (Actually, several tests fails if it's not set here)
So, I leave it as is for now. If it's not right, let me know. I'll fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why it's 1 and not 0, and I think it's correcy. my question was rather why we don't pass it in like in the other constructor, I think it is because the apis that use this constructor don't support indices_boost right? just trying to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe so.
@@ -1002,15 +977,19 @@ public void parseXContent(QueryParseContext context, AggregatorParsers aggParser | |||
scriptFields.add(new ScriptField(context)); | |||
} | |||
} else if (context.getParseFieldMatcher().match(currentFieldName, INDICES_BOOST_FIELD)) { | |||
indexBoost = new ObjectFloatHashMap<>(); | |||
if (!indexBoosts.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I didn't think this through. The two sections have the same name, so the json wouldn't even be valid if they were both there. I think we should not check it explicitly, otherwise we would have to do it for every single field in our parsers, which we don't do. Relates to #19614
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry I misunderstood you. Removed duplicate check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no there was no misunderstanding, I just didn't realize the implications of this check
@@ -173,6 +173,7 @@ | |||
this.timeout = timeout; | |||
queryShardContext = indexService.newQueryShardContext(request.shardId().id(), searcher.getIndexReader(), request::nowInMillis); | |||
queryShardContext.setTypes(request.types()); | |||
queryBoost = request.indexBoost(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we can remove the query boost setter and make the instance member final.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@masaruh I left one single comment, quite important though, on bw comp. Sorry I should have noticed this before.
if (in.getVersion().onOrAfter(Version.V_5_2_0_UNRELEASED)) { | ||
indexBoost = in.readFloat(); | ||
} else { | ||
indexBoost = 1.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we have a problem with backwards compatibility here. Most likely our bw comp tests are going to fail (would be nice to verify that actually). I think indices_boost would get ignored in a mixed version cluster (e.g. 5.x and 5.1):
- 5.x coord node does the resolution on the coord node, serializes search source builder to 5.1 node (nothing changed there) which doesn't know how to resolve indices names yet, hence it reads indices boost from the original source, it works as it used to. That's ok. We can't do better than this.
- 5.1 coord node doesn't resolve indices boost on the coord node, serializes source builder to 5.x node (nothing changed there) which knows how to resolve indice names, but it doesn't get them through shard search request as 5.1 node didn't send the info. Hence the indices boost are always going to be 1, ignored. We should fix this. I think the way to fix it would be to detect based on the version whether we have to read the index boost from the source builder of from the shard search request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh, indeed. Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually tested and work just like you said.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! fix looks good! did you have some tests out of qa/backwards-5.0
fail without the fix? Or did you just test manually? The backwards tests run the rest tests against a mixed version cluster, so I think your newly added tests should fail in that setup. Can you verify? If so we are good to go!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would also be wise to unit test that newly added bw comp layer. We have something similar in ShardSearchTransportRequest for alias filters (we moved their parsing on the coord node with 5.1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested manually without/with the fix.
I added test similar to alias filters.
5c28135
to
476a246
Compare
@@ -96,7 +96,7 @@ private ShardSearchTransportRequest createShardSearchTransportRequest() throws I | |||
filteringAliases = new AliasFilter(null, Strings.EMPTY_ARRAY); | |||
} | |||
return new ShardSearchTransportRequest(searchRequest, shardId, | |||
randomIntBetween(1, 100), filteringAliases, Math.abs(randomLong())); | |||
randomIntBetween(1, 100), filteringAliases, 1.0f, Math.abs(randomLong())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you randomize the boost here so we also test boost serialization from 6.x to 6.x with a value that's different from default? You could do randomBoolean() ? 1.0f : randomFloat()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Done.
BytesStreamOutput output = new BytesStreamOutput(); | ||
output.setVersion(Version.V_5_0_0); | ||
readRequest.writeTo(output); | ||
assertEquals(output.bytes().toBytesRef(), requestBytes.toBytesRef()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot for adding this!
729dce7
to
03428c2
Compare
- match: { hits.hits.1._index: test_1 } | ||
|
||
--- | ||
"Indices boost using array": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excluded clients < 5.2.0 for tests that use array format.
This change allows specifying alias/wildcard expression in indices_boost. And added another format for specifying indices_boost. It accepts array of index name and boost pair. If an index is included in multiple aliases/wildcard expressions, the first match will be used. With new format, old format is marked as deprecated. Closes elastic#4756
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice @masaruh LGTM
Thanks @javanna for thorough review! |
Ugh, I should have rebase before pushing merge button... |
no worries @masaruh , and thanks for fixing. |
This change allows specifying alias/wildcard expression in indices_boost.
And added another format for specifying indices_boost. It accepts array of index name and boost pair.
If an index is included in multiple aliases/wildcard expressions, the first match will be used.
Closes #4756